DLT - an industrial R&D project for multilingual machine translation
Abstract
An overview of the DLT (Distributed Language Translation) project is given. The project aims at a new, multilingual MT system for the 1990s that uses Esperanto as an internal interlingua. The system's architectural features, current progress and project organization are dealt with.

1. Introduction

DLT (Distributed Language Translation) is the name of a principle, a design philosophy and a project. Within the area of MT, it represents another approach for steering between the hazards of low-quality output, endless prolongation of research and development time, restriction to narrowly bounded subject fields, the geometric cost expansion when a new language is added, etc. DLT is a concentrated high-tech effort to attain a product line of language translation modules in the 1990s. Together, these modules will constitute an interactive, knowledge-based, multilingual translation system, perfectly suited for operation on networked desk-top equipment.

DLT was conceived in 1979, in an environment with no historical ties to MT whatsoever. After patents had been applied for in 14 countries, the first publication followed at the conference on "New Systems and Services in Telecommunications" in Liège [1980]. In 1982, the EEC granted a quarter of a million guilders for a DLT Feasibility Study, which was completed in 1983.

A remarkable feature of the DLT design, highlighted in this study, was the use of Esperanto as intermediate language, with its own lexicon. This meant the adoption of an overall interlingual architecture, the most ambitious structure known for an MT system. At the same time, the introduction of Esperanto into the MT scene of the 1980s aroused a lot of skepticism and prejudice. As it happens, this semi-artificial language (invented by an ophthalmologist towards the end of the nineteenth century) is not usually considered a respectable object of study among professional linguists.

2. Design philosophy

The research team at BSO considers Esperanto a valuable tool in language technology, and has motivated its use as the DLT pivot on rigorous systems engineering grounds:

- an overall interlingual architecture, i.e. an MT process of 2 main steps (instead of 3), fits extremely well into the outside operating environment, which consists of 'senders' and 'receivers' linked by a communications network;
- the interlingua (or Intermediate Language) is the 'semi-product' passed over the network, and should be independent of any source or target language in the system;
- the knowledge-based component of the translation process, the world-knowledge inferencing system for resolving ambiguities, is essentially language-independent and can therefore be built entirely in the interlingua; for a multilingual system, this is an important economy-of-scale consideration;
- long-term development and maintenance of a complex translation and world-knowledge system is a task that can only succeed with perfect man-machine interfaces for the system engineers; linguists, lexicographers, terminologists and other specialists must be offered quick and easy access to the heart of the translation machinery; this calls for an interlingua that is directly legible;
- at the same time, the interlingua should be lexicologically autonomous and well-defined, the former eliminating the need for re-paraphrasing in other languages, the latter being a prerequisite for distributed system development (language teams working to and from one common interlingua).

Esperanto meets these requirements.

3. Prototype construction

In 1984, BSO set up a plan for a 6-year research and development project (75 person-years at the cost of 18 million guilders), aimed at a DLT prototype capable of translating at least one language pair (English-French).
This plan received the support of the Ministry of Economic Affairs of the Netherlands, which granted an innovation subsidy of 8 million guilders. The first half of this 6-year schedule has now been completed.

A first prototype of DLT was shown to the press in December 1987. Though operating only slowly as yet, with a small vocabulary (2000 English words) and a restricted grammar, this laboratory model shows the various monolingual and bilingual processing steps of DLT in proper sequence [see also Fig. 1]:

1. Exhaustive parsing of the English source text. Two different parser implementations have been realized in the search for the fastest formalism: one is based on ATNs and BSO's graphic software environment (on SUN 3/50 workstations) developed for setting up, testing and optimizing ATNs; the other is based on APSG and the PARSPAT software system from the University of Amsterdam [Van der Steen, 1987]. The parsing process in DLT is breadth-first, syntax-only, and delivers dependency (not constituency) trees.

2. Surface translation (first half). Contrastive syntactic rules between English and Esperanto are applied here. This system of bilingual rules (250 at present) is based upon dependency grammar formalizations of both languages. The methodological framework has been inspired by the work of the French linguist Tesnière and is comprehensively described in [Schubert, 1987]. Semantic considerations are disregarded systematically at this stage. The result is a (sometimes large) number of 'formally possible' parallel translations.
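As an illustration only, the way a syntax-only surface translation step can yield several 'formally possible' parallel translations may be sketched as follows. This is a toy sketch under stated assumptions, not DLT's actual rule system: the `Node` class, the `LEXICON` entries, and the functions `surface_translate` and `linearize` are all hypothetical, the contrastive syntactic rules are reduced to a word-for-word lookup on a dependency tree, and the word order is simplified.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """One word in a dependency tree, with its relation label and dependents."""
    word: str
    relation: str = "root"
    dependents: list[Node] = field(default_factory=list)


# Hypothetical toy lexicon (an assumption, not DLT data): each English word
# maps to every candidate Esperanto equivalent. No semantic filtering happens
# here, mirroring the "semantics disregarded" property of this stage.
LEXICON = {
    "the": ["la"],
    "bank": ["banko", "bordo"],
    "closes": ["fermas", "finas"],
}


def surface_translate(node: Node) -> list[Node]:
    """Return all candidate target trees for the subtree rooted at `node`."""
    head_options = LEXICON.get(node.word, [node.word])
    # Each dependent subtree is translated independently of its governor.
    dependent_options = [surface_translate(d) for d in node.dependents]
    results = []
    for head in head_options:
        # Cartesian product: every combination of dependent alternatives.
        for combo in product(*dependent_options):
            results.append(Node(head, node.relation, list(combo)))
    return results


def linearize(node: Node) -> str:
    """Flatten a tree to text, dependents before their head (toy word order)."""
    parts = [linearize(d) for d in node.dependents] + [node.word]
    return " ".join(parts)


# Dependency tree for "the bank closes": closes -> bank -> the.
tree = Node("closes", "root", [Node("bank", "subject", [Node("the", "determiner")])])
for candidate in surface_translate(tree):
    print(linearize(candidate))
```

With two alternatives each for "bank" and "closes", the sketch produces 2 x 2 = 4 candidate trees. The number of candidates multiplies with every ambiguous word, which is why the set of parallel translations can become large before any semantic selection is applied.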
Similar resources
Automatic Multilingual Subtitling in the eTITLE project
This paper presents the Multilingual Translation Service of eTITLE, a European eContent project, which has produced tools to assist in the multilingual subtitling of audiovisual material through the web. The eTITLE Translation Service combines state-of-the-art Machine translation and Translation memories, which may be tailored to the customer needs. The user can choose to use only Machine Trans...
A Comprehensive Model for R and D Project Portfolio Selection with Zero-One Linear Goal-Programming (RESEARCH NOTE)
Technology centered organizations must be able to identify promising new products or process improvements at an early stage so that the necessary resources can be allocated to those activities. It is essential to invest in targeted research and development (R and D) projects as opposed to a wide range of ideas so that resources can be focused on successful outcomes. The selection of the most ap...
Dialogue Modelling for Statistical Machine Translation
The proposed project sets out to improve the quality of machine translation (MT) technology. Machine translation, known by the general public through popular applications such as Google Translate, is defined as the automatic translation from one language to another through a computer algorithm – for instance, translating a text from Japanese to Norwegian or vice-versa. In a globalised world whe...
Statistical speech-to-speech translation with multilingual speech recognition and bilingual-chunk parsing
Initiated mainly by the speech community, research in speech-to-speech (S2S) translation has made steady progress in the past decade. Many approaches to S2S translation have been proposed. Among them, corpus-dependent statistical strategies have been widely studied in recent years. In corpus-based translation methodology, rather than taking the corpus just as reference templ...
Applications in Multilingual Machine Translation
The CAT2 Machine Translation System, developed in Saarbrücken in 1987, is a natural language application coded entirely in Prolog. Since its initial development, several languages have been implemented on an experimental basis to evaluate the translation methodology, the underlying formalism, the linguistic descriptions, and the effectiveness of the Prolog implementation. Seven years later, it...